---
title: "Lab 04 - Non-linear regression on dependency trees"
author: "Francisco Javier Jurado, Roger Pujol Torramorell"
date: "October 29, 2019"
output: pdf_document
---

# Results

## On the validity of the inputs
The first table in this results section summarizes the properties of the per-language datasets, in particular the sample size and the mean and standard deviation of both the number of vertices \(n\) and the mean dependency length \(d\):
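Such a summary can be computed directly from each language's data frame. A minimal sketch, using a toy data frame in place of the real data (the column names `n` and `d` are assumptions, not the lab's actual variable names):

```r
# Toy stand-in for one language's data: one row per sentence,
# n = number of vertices, d = mean dependency length
df <- data.frame(n = c(5, 12, 7, 20), d = c(1.8, 2.6, 2.1, 3.0))

summary_row <- c(
  N    = nrow(df),     # sample size (number of sentences)
  mu_n = mean(df$n),   # mean number of vertices
  sd_n = sd(df$n),     # standard deviation of n
  mu_d = mean(df$d),   # mean of the mean dependency lengths
  sd_d = sd(df$d)      # standard deviation of d
)
round(summary_row, 3)
```

Applying this per language (e.g. with `lapply`) and binding the rows gives the summary table.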

Before going any further it is good practice to check what the data looks like, so let's plot the mean dependency length \(d\) against the number of vertices \(n\).

To check for possible power-law dependencies, we plot the same data with logarithmic scales on both axes. Note that rather than log-transforming the data itself we have set the plot axes to logarithmic, which distributes the ticks in a log fashion. The resulting plots suggest a power law, despite the large amount of dispersion.
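The two plots described above can be sketched as follows (toy data in place of the real per-language data frame; `log = "xy"` rescales the axes rather than transforming the values):

```r
# Toy data: n = number of vertices, d = mean dependency length
df <- data.frame(n = c(5, 12, 7, 20), d = c(1.8, 2.6, 2.1, 3.0))

# Raw scatter of d vs n
plot(df$n, df$d,
     xlab = "vertices n", ylab = "mean dependency length d")

# Same data with both axes on a logarithmic scale
plot(df$n, df$d, log = "xy",
     xlab = "vertices n", ylab = "mean dependency length d")
```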

A way to deal with this dispersion and get a clearer intuition of the underlying trend is to average the mean length over all sentences with a given number of vertices. Although there is still a significant amount of dispersion for larger values of \(n\), we now have a much clearer view of the shape of the distribution. Plotting the same averaged points on log-log axes, the data points form an almost straight line (again with dispersion when \(n\) gets large), so we have reasonable evidence to believe they follow a power-law distribution.
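The averaging step can be done in base R with `aggregate`; a minimal sketch with toy data:

```r
# Toy data: several sentences share the same number of vertices n
df <- data.frame(n = c(5, 5, 7, 7, 7, 12),
                 d = c(1.6, 2.0, 2.0, 2.2, 2.4, 2.6))

# Average the mean dependency length over all sentences with the same n
avg <- aggregate(d ~ n, data = df, FUN = mean)
avg
```

Each row of `avg` is one point of the smoothed curve, which can then be plotted on regular or log-log axes as before.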

We now want to compare the real scaling of \(d\) with the one expected under a random linear arrangement. For that purpose we can compare the data points and the averaged ones against the expected mean length, given by \(E[\langle d \rangle] = (n+1)/3\), plotting them on both regular and double-logarithmic scales:
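Overlaying the random-arrangement baseline is a one-liner once the averaged points are available; a sketch with toy data:

```r
# Toy averaged points: <d> per number of vertices n
avg <- data.frame(n = c(5, 7, 12), d = c(1.8, 2.2, 2.6))

# Expected mean length under a random linear arrangement: E[<d>] = (n+1)/3
baseline <- (avg$n + 1) / 3

plot(avg$n, avg$d, log = "xy", xlab = "n", ylab = "<d>")
lines(avg$n, baseline, lty = 2)  # dashed line: random-arrangement expectation
```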

The fitted coefficients of each model for each language are listed below (rows in the same order as the AIC table that follows; model 0 has no estimated parameters, so only models 1–4 appear):

| Language | Model 1: \(b\) | Model 2: \(a\), \(b\) | Model 3: \(a\), \(c\) | Model 4: \(a\) |
|---|---|---|---|---|
| Arabic | 0.3291662 | 0.7381135, 0.3500622 | 1.642763871, 0.009674106 | 0.7324749 |
| English | 0.4543944 | 0.958574, 0.372823 | 2.19314540, 0.01330892 | 1.000429 |
| Basque | 0.4183132 | 0.6714799, 0.4583909 | 1.35534413, 0.03097325 | 0.8677413 |
| Greek | 0.3486333 | 0.7011912, 0.3822672 | 1.58556811, 0.01362438 | 0.760291 |
| Catalan | 0.3442046 | 0.7689222, 0.3513017 | 1.68157789, 0.01206634 | 0.7528104 |
| Hungarian | 0.5890211 | 0.6069758, 0.6158401 | 2.39890712, 0.02079203 | 1.375951 |
| Chinese | 0.3729202 | 0.6339700, 0.4664684 | 1.050008, 0.049412 | 0.831403 |
| Italian | 0.3394035 | 0.6838920, 0.3843947 | 1.50248587, 0.01402991 | 0.7425137 |
| Czech | 0.367975 | 0.6258393, 0.4373232 | 1.54001720, 0.01617884 | 0.7866161 |
| Turkish | 0.4064011 | 0.6237355, 0.4739868 | 1.33141691, 0.02715725 | 0.8508688 |
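These coefficients come from non-linear least-squares fits. As a hedged sketch (not the lab's exact procedure), a model with parameters \(a\) and \(b\) such as the power law \(d = a\,n^b\) can be fitted with `nls`, seeding the starting values from a linear fit on log-log data:

```r
# Toy averaged points, roughly power-law shaped
avg <- data.frame(n = c(5, 7, 12, 20, 35),
                  d = c(1.9, 2.2, 2.7, 3.2, 3.9))

# Initial guesses from a linear fit of log(d) on log(n):
# log(d) = log(a) + b * log(n)
lm0 <- lm(log(d) ~ log(n), data = avg)
start <- list(a = exp(coef(lm0)[[1]]), b = coef(lm0)[[2]])

# Non-linear least-squares fit of d = a * n^b
fit <- nls(d ~ a * n^b, data = avg, start = start)
coef(fit)
AIC(fit)  # the criterion used to compare competing models
```

Repeating this for every model and language, and collecting `AIC(fit)` for each, yields a structure like `model_AICs` below.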
The AIC of each model (0–4) for each language:

| Language | Model 0 | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|---|
| Arabic | 30174.18 | 8221.94 | 8212.02 | 8999.49 | 8534.96 |
| English | 121969.37 | 39802.73 | 39270.11 | 41166.91 | 39323.03 |
| Basque | 14266.83 | 3608.46 | 3579.44 | 4107.36 | 3738.27 |
| Greek | 19991.77 | 5088.28 | 5069.89 | 5546.06 | 5201.95 |
| Catalan | 104292.47 | 23360.12 | 23357.73 | 24848.14 | 23593.28 |
| Hungarian | 38251.06 | 19381.66 | 19365.28 | 20604.62 | 21120.99 |
| Chinese | 180870.39 | 40216.32 | 37750.69 | 44273.01 | 44946.28 |
| Italian | 26583.88 | 6488.02 | 6427.65 | 7243.78 | 6677.05 |
| Czech | 150242.10 | 49508.13 | 49031.75 | 51945.91 | 50653.57 |
| Turkish | 30859.21 | 9929.75 | 9752.06 | 10864.78 | 10115.55 |
```r
# Index of the minimum-AIC (i.e. best) model for each language
best_model <- lapply(model_AICs, function(x) which.min(as.vector(x)))
# Coefficients of that best model, looked up in the per-language parameter lists
best_coefs <- lapply(seq_along(language_list),
                     function(i) best_params[[i]][[best_model[[i]]]])
```